Abstract This paper proposes a novel approach to image
deblurring and digital zooming using sparse local models
of image appearance. These models, where small image
patches are represented as linear combinations of a few
elements drawn from some large set (dictionary) of
candidates, have proven well adapted to several image
restoration tasks. A key to their success has been to
learn dictionaries adapted to the reconstruction of small
image patches. In contrast, recent works have proposed
instead to learn dictionaries which are not only adapted
to data reconstruction, but also tuned for a specific
task. We introduce here such an approach to deblurring
and digital zoom, using pairs of blurry/sharp (or
low-/high-resolution) images for training, as well as an
effective stochastic gradient algorithm for solving the
corresponding optimization task. Although this learning
problem is not convex, once the dictionaries have been
learned, the sharp/high-resolution image can be recovered
via convex optimization at test time. Experiments with
synthetic and real data demonstrate the effectiveness of
the proposed approach, leading to state-of-the-art
performance for non-blind image deblurring and digital
zoom.

Keywords deblurring • super-resolution • dictionary 
learning • sparse coding • digital zoom 

1 Introduction 

With recent advances in sensor design, the quality of 
the signal output by digital reflex and hybrid/bridge 
cameras is remarkably high. Point-and-shoot cameras, 
however, remain susceptible to noise at high sensitiv- 
ity settings and/or low-light conditions, and this prob- 
lem is exacerbated for mobile phone cameras with their 
small lenses and sensor areas. Photographs taken with 
a long exposure time are less noisy but may be blurry 
due to movements in the scene or camera shake. Like- 
wise, although the image resolution of modern cameras 
keeps on increasing, there is a clear demand for high- 
quality digital zooming from amateur and professional 
photographers, whether they crop their family vacation 
pictures or use footage from camera phones in news- 
casts. Thus, the classical image restoration problems of 
denoising, deblurring, multi-frame super-resolution and 
digital zooming (also called single-image super-resolution) 
are still of acute and in fact growing importance, and 
they have received renewed attention lately with the 
emergence of computational photography (e.g., [8,12, 
16]). 

The image deblurring problem is naturally ill-posed: 
Indeed, perfect low-pass filters remove all high-frequen- 
cy information from images. They are non-invertible 
operators, and different sharp images can give rise to 
the same blurry one. Thus, an appropriate image model
is required to regularize the deblurring process. Several
explicit priors for natural images have been proposed in
the past for different tasks in image restoration. Early
work relied on various smoothness assumptions, or image
decompositions on fixed bases such as wavelets [20].
More recent approaches include non-local means filtering
[1], learned sparse models [6,28,19], piecewise linear
estimators [29], Gaussian scale mixtures [22], fields
of experts [24], kernel regression [26], and block
matching with 3D filtering (BM3D) [3]. Pairs of
low-/high-resolution images have also been used as an
implicit image prior in digital zooming tasks [11], and
combining the exemplar-based approach with image
self-similarities at different scales has recently led to
impressive results [12].

We propose in this paper to build on several of these
ideas with a new approach to non-blind image deblur- 
ring (the blur kernel is assumed to be fixed and known) 
and digital zooming. Like Freeman et al. [11], we use 
training pairs of blurry/sharp or low-/high-resolution 
image patches readily available for these tasks to learn 
our model parameters. We also exploit learned sparse 
local models of image appearance, as in [6,28], which 
have been known to be very effective for several image
reconstruction tasks. Our method shares some ideas
with the work of Yang et al. [28], but our formulation
combines several novelties that improve the results:

- Whereas the approach of [28] is purely generative 
(this model learns how to simultaneously reconstruct 
pairs of low- and high-resolution patches), our approach 
learns how to reconstruct a high-resolution patch given 
a low-resolution one. In essence, the difference is the 
same as between generative and discriminative models 
in machine learning. 

- We present a novel formulation for non-blind image
deblurring and digital zooming, combining a linear
predictor with dictionary learning, and show with ex- 
tensive experiments on both synthetic and real data 
that our approach is competitive with the state of the 
art for these two tasks. 

- We adapt the stochastic gradient descent of [17] 
for solving the corresponding learning problem allowing 
the use of large databases of training patches (typically 
several millions). 

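Notation. We define for $p \geq 1$ the $\ell_p$ norm of
a vector $x$ in $\mathbb{R}^m$ as
$\|x\|_p = \left(\sum_{i=1}^{m} |x[i]|^p\right)^{1/p}$, where
$x[i]$ denotes the i-th coordinate of $x$. We denote the
Frobenius norm of a matrix $X$ in $\mathbb{R}^{m \times n}$ by
$\|X\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n} X[i,j]^2\right)^{1/2}$.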



2 Related Work 

2.1 Deblurring and Digital Zoom 

Blur is a common image degradation, and the literature 
on the subject is quite large (see, e.g., [5,8-10,16,26]). 
Most existing methods assume a shift-invariant blur op- 
erator such that a blurry image B can be modelled as 
the convolution of the sharp image S with a fixed blur 
kernel k: 

B = k*S + n, (1) 

where n is an additive noise, usually i.i.d. Gaussian with 
zero mean. This model, while often satisfactory, does 
not take into account the fact that blur due to defocus 
or rotational camera motion is not uniform [16]. But, 
at least locally, it is sufficient to describe many types of 
blurs. 
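
As an illustration of Eq. (1), the following Python sketch blurs a small
image with a fixed kernel and adds i.i.d. Gaussian noise. The helper names
(`convolve2d_same`, `degrade`), the zero padding at the image borders and
the toy 3x3 uniform kernel are assumptions made for this example; they are
not choices taken from the paper.

```python
import random


def convolve2d_same(image, kernel):
    """2-D convolution with zero padding; the output has the size of `image`."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # kernel centre (odd-sized kernels assumed)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Convolution reads the image at mirrored kernel offsets.
                    yy, xx = y + ch - i, x + cw - j
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def degrade(sharp, kernel, sigma, seed=0):
    """Return B = k * S + n, with n i.i.d. Gaussian of standard deviation sigma."""
    rng = random.Random(seed)
    blurred = convolve2d_same(sharp, kernel)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in blurred]


# A 3x3 uniform kernel spreads a single bright pixel evenly over its neighbours.
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
sharp = [[0.0] * 5 for _ in range(5)]
sharp[2][2] = 9.0
noiseless = degrade(sharp, kernel, sigma=0.0)
noisy = degrade(sharp, kernel, sigma=1.0)
```

With `sigma=0.0` the output is the pure convolution, which makes the effect
of the kernel easy to inspect before noise is added.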

In the noiseless case, when the filter is a known
imperfect low-pass filter (that is, its Fourier transform
has no zero), the blurring operator is invertible
and deblurring amounts to inverting the filter in the
Fourier domain. However, noise is always present in
natural images, and even a small amount dominates the
signal in high frequencies, leading to numerous artefacts.
Regularization methods have been extensively studied to
tackle this problem [14]. They usually impose smooth- 
ness constraints on the reconstructed images. The most 
recent and effective algorithms in this line of work usu- 
ally adopt a two-step approach [4,10,13]: first, a sim- 
ple regularized inversion of the blur is performed, then 
the resulting image is processed with classical denois- 
ing algorithms to remove artefacts. Various denoising 
methods have been used for this task: for instance, a 
Gaussian scale mixture model (GSM) [13], the shape- 
adaptive discrete cosine transform [10], or block match- 
ing with 3D-filtering kernel regression [4] . 

The digital zooming literature has seen in recent 
years the development of another line of research,
following the exemplar-based method introduced by
Freeman et al. [11]. Correspondences between
high-resolution patches and low-resolution ones are learned by
building a large database of such pairs. This idea has 
been successfully exploited by Glasner et al. [12], lead- 
ing to state-of-the-art results. Along the same line, but 
using sparse image representations instead, pairs of cor- 
responding patches are used by Yang et al. [28] to jointly 
learn high and low-resolution dictionaries. As shown 
in Section 3, the method we propose exploits these 
exemplar-based ideas as well, but in a significantly dif- 
ferent way. 

2.2 Learned Sparse Representations

Like several recent approaches to image restoration
[6,28], our method is based on the sparse decomposition of
image patches. Using a dictionary matrix
$D = [d_1, \ldots, d_k]$ in $\mathbb{R}^{m \times k}$, a signal
$x$ in $\mathbb{R}^m$ is reconstructed as a linear
combination of a few columns of $D$, called atoms or
dictionary elements. In typical image processing
applications, $m$ is relatively small, for instance $m = 64$
for image patches of size 8x8 pixels, and $k$ can be larger
than $m$, e.g., $k = 256$. We say that the dictionary $D$ is
well adapted to a vector $x$ when there exists a sparse
vector $\alpha$ in $\mathbb{R}^k$ such that $x$ can be
approximated by the product $D\alpha$.

Exploiting these types of models usually requires a 
"good" dictionary. It can either be prespecified or de- 
signed by adapting its content to fit a given set of sig- 
nal examples. Choosing prespecified atoms is appealing: 
The theoretical properties of the corresponding dictio- 
naries can often be analysed, and, in many cases, it 
leads to fast algorithms for computing sparse repre- 
sentations. This is indeed the case for wavelets [20], 
curvelets, steerable wavelet filters, short-time Fourier 
transforms, etc. The success of the corresponding dic- 
tionaries in applications depends on how suitable they 
are to sparsely describe the relevant signals. 

Another approach consists of learning the dictionary
on a set of signal examples. The sparse decomposition
of a patch $x$ on a fixed dictionary $D$ can be achieved by
solving an optimization problem called Lasso in
statistics [27] or basis pursuit in signal processing [2]:

$$\min_{\alpha \in \mathbb{R}^k} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1, \qquad (2)$$

where the code $\alpha$ in $\mathbb{R}^k$ is the
representation of $x$ over the dictionary $D$, and $\lambda$
is a parameter for controlling the sparsity of the
solution.^1 Following an idea originally introduced in the
neuroscience community by Olshausen and Field [21],
Aharon et al. [6] have empirically shown that learning a
dictionary $D$ adapted to natural images could lead to
better performance for image denoising than using
off-the-shelf ones. For a database of $n$ patches of size
$m$, a dictionary is learned by solving the following
optimization problem

$$\min_{D \in \mathcal{D},\, \alpha_1, \ldots, \alpha_n} \frac{1}{n}\sum_{i=1}^{n} \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1, \qquad (3)$$

where $x_i$ is the i-th patch of the training set, and
$\alpha_i$ is its associated sparse code. To prevent the
columns of $D$ from being arbitrarily large (which would
lead to arbitrarily small values of the $\alpha_i$), the
dictionary $D$ is constrained to belong to the set
$\mathcal{D}$ of matrices in $\mathbb{R}^{m \times k}$
whose columns have an $\ell_2$ norm less than or equal to
one.

^1 It is well known that $\ell_1$ regularization yields a
sparse solution for $\alpha$, but there is no direct
analytic link between the value of $\lambda$ and the
corresponding effective sparsity that it yields.

Several algorithms have been designed to address 
this problem. They either update $D$ and the vectors $\alpha_i$
in a sequential way [6], or are based on stochastic ap- 
proximations [18,21]. 
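
The sparse coding step of Eq. (2) can be sketched with a basic iterative
soft-thresholding scheme (ISTA). This is a minimal illustration under
stated assumptions: ISTA is a generic proximal-gradient solver swapped in
for the example, not the solver used by the authors; the function names,
the iteration count and the step size (derived from the squared Frobenius
norm of the dictionary, which upper-bounds its squared spectral norm) are
all hypothetical choices.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_ista(x, D, lam, n_iter=500):
    """Approximately solve min_a ||x - D a||_2^2 + lam * ||a||_1.

    D is given as a list of m rows with k entries each.
    """
    m, k = len(D), len(D[0])
    # Lipschitz bound for the gradient 2 D^T (D a - x): 2 * ||D||_F^2.
    L = 2.0 * sum(D[i][j] ** 2 for i in range(m) for j in range(k))
    a = [0.0] * k
    for _ in range(n_iter):
        residual = [sum(D[i][j] * a[j] for j in range(k)) - x[i]
                    for i in range(m)]
        grad = [2.0 * sum(D[i][j] * residual[i] for i in range(m))
                for j in range(k)]
        a = [soft_threshold(a[j] - grad[j] / L, lam / L) for j in range(k)]
    return a


# With an orthonormal dictionary the solution is known in closed form:
# each coefficient is x[j] soft-thresholded at lam / 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
code = lasso_ista([3.0, 0.4], identity, lam=2.0)
```

The small second coordinate is set exactly to zero, which is the sparsity
effect of the $\ell_1$ penalty mentioned in the text.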

2.3 Deblurring with Dictionaries 

Several methods using dictionaries for deblurring have 
been presented in recent years [28,29]. Yu et al. [29],
while not learning a dictionary as presented in the
previous section, use orthogonal bases obtained with
principal component analysis (PCA). By "learning" several
such dictionaries (one for each edge direction), and by
choosing the best dictionary for each patch, the sharp
patch can be reconstructed.

In the pioneering work by Yang et al. [28], a pair of
dictionaries $(D_b, D_s)$ is used, one dictionary for
preprocessed blurred patches and the other for sharp
patches. The preprocessing consists of the concatenation
of oriented high-pass filters (gradients and Laplacian
filters). During training, $D_b$ and $D_s$ are learned for
representing simultaneously (with the same sparse code)
the sharp patches with $D_s$ and the preprocessed blurred
patches with $D_b$. At test time, given a new preprocessed
blurry patch $x$, a sparse code $\alpha$ is obtained by
decomposing $x$ using $D_b$, and one hopes $D_s\alpha$ to be
a good estimate of the unknown sharp patch.

This method, while appealing by its simplicity, suf- 
fers from an asymmetry between training and testing: 
Whereas in the learning phase, both blurred and sharp 
patches are used to obtain the sparse codes, at test time 
the code is only computed using the blurry patches. Our 
method addresses this problem by a different training 
formulation. Moreover, preprocessing the data has
empirically not proven to be necessary.

3 Proposed Approach 

We show in this section how to learn dictionaries
adapted to the deblurring and digital zoom tasks. As in
exemplar-based methods [11,12,28], we are given a
training set of n pairs of patches (obtained from pairs of
blurry/sharp images), that are used to estimate model 
parameters. Unlike the classical dictionary learning prob- 
lem of Eq. (3) which is unsupervised, our deblurring and 
digital zoom formulation is therefore supervised, trying 
to predict the sharp patches from the blurry ones. 

To predict a sharp pixel value, it is necessary to
observe neighbouring blurry pixels. Sharp patches and
blurry patches may therefore have different sizes, which
we denote respectively by $m_s$ and $m_b$, with $m_b$ larger
than $m_s$. During the test phase, we observe a test
image B and try to estimate the underlying sharp image S
according to Eq. (1), assuming of course that its blur is 
of the same nature as the one used during the training 
phase. The following sections present different formula- 
tions to recover an estimate of S. 

3.1 Linear Model 

Blurring is, at least locally, a linear operation resulting 
from the convolution of a sharp image with a filter. 
When the support of the blur kernel is small compared 
to the patch sizes $m_s$ and $m_b$, one can assume a linear
relation between the blurry and sharp patches. Thus, 
a simple approach to the deblurring problem consists 
of learning how to invert this linear transform with a 
simple ridge regression model. 

Training Step: A training set $(b_i, s_i)$, $i = 1, \ldots, n$
of pairs of blurry/sharp patches is given. The training
step amounts to finding the matrix $W$ in
$\mathbb{R}^{m_s \times m_b}$ that solves the following
optimization problem:

$$\min_{W \in \mathbb{R}^{m_s \times m_b}} \frac{1}{n}\sum_{i=1}^{n} \|s_i - W b_i\|_2^2 + \mu \|W\|_F^2, \qquad (4)$$

where $\|W\|_F$ denotes the Frobenius norm of the matrix
$W$, $n$ is the number of training pairs of patches,
and $\mu$ is a regularization parameter, which prevents
overfitting on the training set and ensures that the
learning problem is well posed. When $n$ is very large
(several millions), overfitting is unlikely to occur and
setting $\mu$ to a small value (a very small power of ten
in our experiments) leads to acceptable results in
practice. For this reason, and for simplifying the
notation, we drop the term $\mu \|W\|_F^2$ in the rest of
the paper.
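
Because the ridge regression objective of Eq. (4) is quadratic in $W$, its
minimizer can be written in closed form. The short derivation below is a
sketch under the assumption that $\mu > 0$, so that the matrix being
inverted is positive definite; the identity matrix $I$ (of size
$m_b \times m_b$) is notation introduced here for the example.

```latex
\nabla_W \Big[ \frac{1}{n}\sum_{i=1}^{n} \|s_i - W b_i\|_2^2 + \mu \|W\|_F^2 \Big]
  = -\frac{2}{n}\sum_{i=1}^{n} (s_i - W b_i)\, b_i^\top + 2\mu W = 0
\quad\Longrightarrow\quad
W = \Big( \sum_{i=1}^{n} s_i b_i^\top \Big)
    \Big( \sum_{i=1}^{n} b_i b_i^\top + n\mu I \Big)^{-1}.
```

Both sums can be accumulated in a single pass over the training pairs, which
is convenient when $n$ is in the millions.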

Testing Step: The parameters W are now fixed, 
and we are given a noisy test image B, the goal being 
to recover a sharp estimate S. However, as mentioned 
in Section 2, the noise dominates the signal in high 
frequencies, and in practice the linear model, which ba- 
sically tries to invert the blur operator, leads to poor 
results despite the large amount of training data. Im- 
provements can be achieved using recent denoising al- 
gorithms, either by pre-processing B to remove some of 
its noise, and/or by post-processing the sharp estimate 
to remove artefacts. 

We now pre-process $B$ and call $\bar{B}$ its denoised
version, which is obtained with a denoising algorithm [19],
and respectively denote by $\bar{b}_i$ and $s_i$ the patches
of $\bar{B}$ and $S$ centered at the pixel $i$, using any
indexing of the image pixels. Note that the patches are
here different from the ones in the training set, even
though we use for simplicity the same notation. We assume
with our learned linear model that the relation
$s_i \approx W \bar{b}_i$ holds for the patch indexed by
$i$. According to this model, the problem of
reconstructing the sharp image $S$ can be written as:

$$\min_{S} \frac{1}{l}\sum_{i=1}^{l} \|s_i - W \bar{b}_i\|_2^2, \qquad (5)$$

where $l$ is the number of patches in the image $S$. By
using such a local linear model, and since the patches
overlap, each pixel of the image $S$ admits as many
predictions as patches it belongs to. The solution of
Eq. (5) is the average of the different predictions at each
pixel, which is a classical way of aggregating estimates
in patch-based methods [6].
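
The aggregation step described above (each pixel takes the mean of all
patch predictions that cover it) can be sketched as follows. The function
name, the dictionary-of-patches input format and the indexing of patches by
their top-left corner (rather than by their centre pixel, as in the text)
are assumptions made to keep the example short.

```python
def aggregate_patches(patches, height, width, size):
    """Average overlapping size x size patch predictions into one image.

    `patches` maps the (row, col) of each patch's top-left corner to its
    predicted values, given as a list of `size` rows.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (r, c), patch in patches.items():
        for i in range(size):
            for j in range(size):
                total[r + i][c + j] += patch[i][j]
                count[r + i][c + j] += 1
    # Pixels covered by no patch are left at 0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)] for y in range(height)]


# Two 2x2 predictions that overlap on the middle column of a 2x3 image.
patches = {(0, 0): [[1.0, 1.0], [1.0, 1.0]],
           (0, 1): [[3.0, 3.0], [3.0, 3.0]]}
image = aggregate_patches(patches, height=2, width=3, size=2)
```

In the overlapping column the two predictions (1 and 3) are averaged to 2,
which is the per-pixel least-squares solution when the patch predictions
are held fixed.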

This model is easy to optimize and to understand 
but has several limitations. First, small mistakes made 
during the denoising process can be amplified by the 
deblurring step. 

Second, when the blur kernel totally suppresses some 
of the high frequencies of the image, putting them to 
zero, one cannot recover them with a local linear model: 
in the Fourier domain it corresponds to a multiplication
of the nullified coefficient by a finite number. This is one 
of the motivations for introducing a nonlinear model 
based on sparse representations to overcome these lim- 
itations. 

3.2 Dictionary Learning Formulation 

In a recent paper, Yang et al. [28] have shown that
learning multiple dictionaries to establish
correspondences between low- and high-resolution image
patches is an effective approach to digital zoom.
Following this idea, we propose to learn a pair of
dictionaries $D_s$ in $\mathbb{R}^{m_s \times k}$ and $D_b$
in $\mathbb{R}^{m_b \times k}$ to reconstruct patterns that
the linear model presented in the previous section
cannot recover.

Training step: Given again a training set $(b_i, s_i)$,
$i = 1, \ldots, n$ of pairs of blurry-noisy/sharp patches,
we address

$$\min_{W,\, D_b \in \mathcal{D},\, D_s} \frac{1}{n}\sum_{i=1}^{n} \|s_i - W \bar{b}_i - D_s \alpha^\star(b_i, D_b)\|_2^2, \qquad (6)$$

where $\alpha^\star(b_i, D_b)$ is the solution of the
following sparse coding problem

$$\alpha^\star(b_i, D_b) \triangleq \arg\min_{\alpha \in \mathbb{R}^k} \|b_i - D_b \alpha\|_2^2 + \lambda \|\alpha\|_1, \qquad (7)$$

which is unique and well defined under a few reasonable
assumptions on the dictionary $D_b$ (see [18] and
references therein for more details).^2 The patch
$\bar{b}_i$ is a denoised version of $b_i$. The matrices
$D_s$ and $D_b$ are two dictionaries jointly learned such
that for all $i$, $W \bar{b}_i + D_s \alpha^\star(b_i, D_b)$
is a good estimator of the sharp patch $s_i$. Summing two
different predictors is a classical way of combining two
models. In this case, we are hoping that the addition of
the dictionary term to the linear term will permit a
better recovery of the high frequencies. The two models
are optimized jointly and are not just an averaging of
two independent predictors.

Note that Ds does not need to be regularized in our 
formulation. We indeed assume that a large amount 
of training data is available and as a consequence our 
model does not suffer from overfitting.

Testing step: According to our model, and using
the same notations as in Eq. (5), our estimate of $S$ at
test time is obtained by solving the following
optimization problem

$$\min_{S} \frac{1}{l}\sum_{i=1}^{l} \|s_i - W \bar{b}_i - D_s \alpha^\star(b_i, D_b)\|_2^2, \qquad (8)$$

where $s_i$, $b_i$, $\bar{b}_i$ are respectively here the
patches centered at pixel $i$ of the sharp image $S$, the
blurry, noisy image $B$ and the blurry, denoised image
$\bar{B}$.

The optimization problem defined in Eq. (6) is harder 
than the classical dictionary learning of Eq. (3) or the 
one formulated by Yang et al. [28], but this formulation
presents advantages. 

In the work of Yang et al. [28], the sparse
coefficients $\alpha$ are obtained during the training
phase by jointly decomposing blurry patches $b_i$ and sharp
patches $s_i$ onto two learned dictionaries $D_b$ and
$D_s$. Such a model aims to ensure that there always
exists a sparse code $\alpha$ that fits both the patches
$b_i$ and $s_i$. However, at test time, since the sharp
patches are not available, the vectors $\alpha$ can only
be computed from blurry patches $b_i$, and the fact that
the resulting $\alpha$ should be good for the
corresponding sharp patch $s_i$ is not guaranteed.

Our approach does not suffer from this issue since
the sparse coefficients $\alpha$ are always obtained on
blurry patches only, both during the training and testing
phases. We learn the dictionaries $D_b$ and $D_s$ and the
linear predictor $W$ such that $s_i$ is well predicted
given a patch $b_i$. Whereas this solves the issue
mentioned above, it leads to more challenging
optimization problems than [28]. The optimization method
we propose builds upon [17], which provides a general
framework for solving such dictionary learning problems.
The method is presented briefly in Section 4.

^2 We have empirically found for our deblurring and
super-resolution tasks on natural image patches and our
dictionaries that the solution of Eq. (7) was always
unique. For different tasks or data, the possible
non-uniqueness of the Lasso solution could be an issue
(see [17]).

We have presented so far a framework adapted to 
the deblurring task, where we wanted to obtain a sharp 
image from a blurry one. The problem of digital zoom 
consists of increasing the resolution of an image, but 
can be formulated as a deblurring problem in a simple 
way: A low-resolution image can indeed be turned into 
a blurry high-resolution image with any interpolation 
technique, the task of digital zoom being then to de- 
blur this new image. The training pairs of images can 
be generated by downsampling high-resolution images. 
Note that the antialiasing filter applied during down- 
sampling and the choice of the interpolation method 
are important. We worked with the antialiasing from 
the Matlab function imresize. 
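
The construction of training pairs for digital zoom can be sketched as
below. This is only an illustration of the idea: a 2x2 box average stands
in for the antialiasing filter and pixel replication stands in for the
interpolation, whereas the paper relies on the antialiasing of Matlab's
imresize. The function names and the fixed zoom factor of 2 are
assumptions.

```python
def downsample2(image):
    """Halve the resolution by averaging each 2x2 block (box antialiasing)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upsample2(image):
    """Double the resolution by pixel replication (nearest neighbour)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def make_training_pair(sharp):
    """Return a (blurry, sharp) pair of images of identical size."""
    return upsample2(downsample2(sharp)), sharp


sharp = [[0.0, 4.0, 8.0, 8.0],
         [0.0, 4.0, 8.0, 8.0],
         [2.0, 2.0, 6.0, 6.0],
         [2.0, 2.0, 6.0, 6.0]]
blurry, target = make_training_pair(sharp)
```

Patches extracted at the same locations in `blurry` and `target` then play
the roles of the blurry and sharp patches of the deblurring formulation.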

4 Optimization 

The formulation of Eq. (6) for learning a pair of
dictionaries $D_b$ and $D_s$ and a linear predictor $W$ for
the deblurring task is a large-scale learning problem,
where many training samples $(b_i, s_i)$ can easily be
available. The main difficulty in the optimization comes
from the terms $\alpha^\star(b_i, D_b)$, which are defined
as solutions to the sparse coding problem of Eq. (7). The
vectors $\alpha^\star(b_i, D_b)$ therefore depend on the
dictionary $D_b$ and are not differentiable with respect
to it, preventing us from using a direct gradient descent
method.

However, despite these two drawbacks, it has been 
shown in [17] that such problems enjoy a few asymp- 
totic properties that make it possible to use stochastic 
gradient descent when the number of training samples is 
large. Assuming an infinite training set $(b_i, s_i)$ of
i.i.d. samples drawn from some probability distribution,
and under mild assumptions, we define the asymptotic
cost function

$$f(D_b, D_s, W) = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \|s_i - W \bar{b}_i - D_s \alpha^\star(b_i, D_b)\|_2^2 = \mathbb{E}_{(b,s)}\big[\|s - W \bar{b} - D_s \alpha^\star(b, D_b)\|_2^2\big], \qquad (9)$$

where $(b, s)$ are random variables distributed
according to the joint probability distribution of
low/high-resolution patches, and $\bar{b}$ denotes the
denoised version of $b$.

The optimization of cost functions that have the 
form of an expectation over a supposedly infinite train- 
ing set is usually tackled with stochastic gradient tech- 
niques (see [17, 18] and references therein), that are iter- 
ative procedures drawing randomly one element of the 
training set at a time. Of course training sets are in
practice finite, but we have empirically obtained good
results by optimizing on a large training set of 10
million training patches. This is indeed the approach
proposed in [17] for such problems, from which the
following proposition can be derived.

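Proposition 1 [Differentiability of $f$] Assume that
the training data $(b, s)$ admits a continuous probability
density, and assume the same hypotheses on the
dictionary $D_b$ as in [17]. Then, $f$ is differentiable and

$$\begin{aligned}
\nabla_{W} f &= -\mathbb{E}_{(b,s)}\big[2\,(s - D_s\alpha^\star - W\bar{b})\,\bar{b}^\top\big],\\
\nabla_{D_s} f &= -\mathbb{E}_{(b,s)}\big[2\,(s - D_s\alpha^\star - W\bar{b})\,\alpha^{\star\top}\big],\\
\nabla_{D_b} f &= -\mathbb{E}_{(b,s)}\big[2\,(b\,\beta^{\star\top} - D_b\alpha^\star\beta^{\star\top} - D_b\beta^\star\alpha^{\star\top})\big],
\end{aligned} \qquad (10)$$

where $\alpha^\star$ denotes $\alpha^\star(b, D_b)$, and

$$\beta^\star_{\Lambda^c} = 0 \quad\text{and}\quad \beta^\star_{\Lambda} = (D_{b\Lambda}^\top D_{b\Lambda})^{-1} D_{s\Lambda}^\top (s - D_s\alpha^\star - W\bar{b}), \qquad (11)$$

where $\Lambda$ denotes the indices of the nonzero
coefficients of $\alpha^\star$ (and $\Lambda^c$ its
complement), for any vector $u$, the vector $u_\Lambda$
contains the values of the vector $u$ corresponding to the
indices $\Lambda$, and for any matrix $U$, the matrix
$U_\Lambda$ contains the columns of $U$ corresponding to
the indices $\Lambda$.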
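
The gradient with respect to the blurry dictionary is the least obvious
term of the proposition, so a numerical sanity check can be useful. The
sketch below treats a deliberately reduced case, which is an assumption of
this example and not the setting of the paper: each dictionary has a single
atom (`d` for the blurry one, `e` for the sharp one), $W$ is zero and no
denoising is applied. In that case the sparse code is a scalar obtained by
soft-thresholding, the auxiliary variable is $\beta = e^\top r / d^\top d$
with $r = s - e\alpha$, and the per-sample gradient is
$-2\beta\,(b - 2\alpha d)$. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_code(b, d, lam):
    """Minimizer of ||b - d a||_2^2 + lam |a| over scalars a (single atom)."""
    c = dot(d, b)
    shrunk = max(abs(c) - lam / 2.0, 0.0)
    return (shrunk if c > 0 else -shrunk) / dot(d, d)


def loss(b, s, d, e, lam):
    """||s - e * alpha(b, d)||_2^2, i.e. the per-sample cost with W = 0."""
    alpha = sparse_code(b, d, lam)
    return sum((si - ei * alpha) ** 2 for si, ei in zip(s, e))


def grad_blurry_atom(b, s, d, e, lam):
    """Analytic per-sample gradient of the loss with respect to d."""
    alpha = sparse_code(b, d, lam)
    if alpha == 0.0:
        return [0.0] * len(d)  # empty active set: beta is zero
    residual = [si - ei * alpha for si, ei in zip(s, e)]
    beta = dot(e, residual) / dot(d, d)
    # -2 (b beta - d alpha beta - d beta alpha), with scalar alpha and beta
    return [-2.0 * beta * (bi - 2.0 * alpha * di) for bi, di in zip(b, d)]


b, s = [1.0, 2.0], [0.5, -1.0, 2.0]
d, e = [0.6, 0.5], [0.3, -0.2, 0.7]
lam, eps = 0.4, 1e-6
analytic = grad_blurry_atom(b, s, d, e, lam)
numeric = []
for j in range(len(d)):
    plus = [dj + eps * (i == j) for i, dj in enumerate(d)]
    minus = [dj - eps * (i == j) for i, dj in enumerate(d)]
    # Central finite difference along coordinate j of the atom.
    numeric.append((loss(b, s, plus, e, lam) - loss(b, s, minus, e, lam))
                   / (2.0 * eps))
```

Comparing `analytic` with `numeric` checks both the sign convention and the
active-set formula; the check is only meaningful away from points where the
sparse code switches between zero and nonzero.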

Algorithm 1 presents our method for learning $D_s$, $W$
and $D_b$. It is a stochastic gradient descent algorithm,
which adapts [17] to our formulation. It draws randomly
one element of the training set at each iteration,
computes the terms inside the expectations of Eq. (10), and
moves the parameters $D_s$, $W$, $D_b$ one step in these
directions.

Since $D_b$ is constrained to be in the set $\mathcal{D}$
defined after Eq. (3), an orthogonal projection on this set
is required at each iteration of the algorithm. It is
denoted by $\Pi_{\mathcal{D}}$.
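
The orthogonal projection onto the set of matrices whose columns have
$\ell_2$ norm at most one acts independently on each column: columns that
already satisfy the constraint are left unchanged and the others are
rescaled to unit norm. A minimal sketch, with a hypothetical function name
and a list-of-rows matrix layout chosen for the example:

```python
import math


def project_columns(D):
    """Project D (a list of rows) onto matrices with column norms <= 1."""
    m, k = len(D), len(D[0])
    norms = [math.sqrt(sum(D[i][j] ** 2 for i in range(m))) for j in range(k)]
    # Dividing by max(norm, 1) rescales only the columns that are too long.
    return [[D[i][j] / max(norms[j], 1.0) for j in range(k)]
            for i in range(m)]


D = [[3.0, 0.3],
     [4.0, 0.4]]
P = project_columns(D)
```

Here the first column (norm 5) is rescaled to unit norm, while the second
column (norm 0.5) is untouched.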

To improve the efficiency of the algorithm, we use a
classical heuristic often referred to as "mini-batch":
Instead of drawing a single pair of the training set at a
time, we draw $\eta$ of them, e.g., $\eta = 500$, compute
$\eta$ directions given by Eq. (10), and move the model
parameters $D_b$, $D_s$, $W$ in the average direction. This
improves the stability of the stochastic gradient descent
algorithm, and experimentally gives a faster convergence.
Since our optimization problem is not convex, it requires
a good initialization. We proceed as follows: (i) We learn
a dictionary $D_b$ using the unsupervised formulation of
Eq. (3) with the software^3 accompanying [18] on the
set of patches $b_i$. (ii) We fix $D_b$ and optimize Eq. (6)
^3 The SPAMS toolbox is an open-source software available
at: http://www.di.ens.fr/willow/SPAMS/



Algorithm 1 Dictionary Learning for Deblurring and Digital Zoom

Require: $(b_i, s_i)$, $i = 1, \ldots, n$ (training set);
  $\lambda, \mu \in \mathbb{R}$ (parameters);
  $D_b \in \mathcal{D}$ (initial "blurry" dictionary), $D_s$ (initial
  "sharp" dictionary); $T$ (number of iterations); $t_0, \rho$
  (learning rate parameters for the stochastic gradient descent).
for $t = 1$ to $T$ do
    Draw $(b_t, s_t)$ from the training set.
    Sparse coding: compute $\alpha^\star = \alpha^\star(b_t, D_b)$.
    Compute the active set: $\Lambda \leftarrow \{j : \alpha^\star[j] \neq 0\}$.
    Compute $\beta^\star$ according to Eq. (11).
    Choose the learning rate $\rho_t$.
    Update parameters:
        $W \leftarrow W + \rho_t\,(s_t - D_s\alpha^\star - W\bar{b}_t)\,\bar{b}_t^\top$,
        $D_s \leftarrow D_s + \rho_t\,(s_t - D_s\alpha^\star - W\bar{b}_t)\,\alpha^{\star\top}$,
        $D_b \leftarrow \Pi_{\mathcal{D}}\big[D_b + \rho_t\,(b_t\beta^{\star\top} - D_b\alpha^\star\beta^{\star\top} - D_b\beta^\star\alpha^{\star\top})\big]$.
end for
return $(D_b, D_s, W)$ (learned model parameters).



with respect to W and Ds, which is a convex optimiza- 
tion problem. In experiments, this procedure provides 
us with a good initialization. 



5 Experiments 

We present here experimental results obtained with our
method and comparisons with state-of-the-art methods.
In all our experiments, after the initialization step
described in the previous section, we use the stochastic
gradient descent algorithm with one pass over a
database of approximately 10 million training patches,
which are extracted from a set of natural images. All
the images from this dataset are unrelated to the images
used for testing our method. Our implementation
is coded in C++ and Matlab. Learning a dictionary
usually takes a few hours on a recent computer, while
testing an image is faster (less than one minute for most
of our test images).



5.1 Non-Blind Deblurring with Isotropic Kernels 

To evaluate our method on the non-blind deblurring
task, we have chosen a classical set of images and types
of blurs, which has been used in several recent image
processing papers (see [29] and references therein).
Even though addressing such a synthetic non-blind
deblurring task of course slightly deviates from real
restoration problems with digital cameras, it is still an
active topic in the image processing community, and has in
fact proven useful in the past, leading to high-impact

Table 1 Experiment settings for the non-blind deblurring.

Exp. | Blur kernel k                                   | Noise variance $\sigma^2$
1    | 9x9 uniform blur                                | 0.308
2    | $k(x_1, x_2) = 1/(1 + x_1^2 + x_2^2)$           | 2
3    | $k(x_1, x_2) = 1/(1 + x_1^2 + x_2^2)$           | 8
4    | $k = [1\ 4\ 6\ 4\ 1]^\top [1\ 4\ 6\ 4\ 1] / 256$ | 49
5    | Gaussian blur of variance 1                     | 25
6    | Gaussian blur of variance 2                     | 25


applications in astronomic imaging [25] for example (see 
Section 5.2). 

The different combinations of blurs and noises are
detailed in Table 1, with the shape of the blur kernel
and the variance of the noise (which is Gaussian
and white). They are used in other papers and go from
strong-blur/weak-noise to weak-blur/strong-noise cases.

For each blur level, we have generated pairs of
blurry/sharp images from our training database, and learned
dictionaries of size $k = 512$ elements. We have observed
that the quality of the results usually improves with the
dictionary size, 512 being a good compromise between
quality and computational cost. Since our database is
large, the parameter $\mu$ is always set to a negligible
value (a very small power of ten). The sizes of patches
$m_s$ and $m_b$ are respectively set to 7 and 11 for all
experiments. The only parameter which should be carefully
tuned to obtain good results is the regularization
parameter $\lambda$. Following [4,10,13], we have manually
chosen a value of $\lambda$ via a rough grid search for
each type of blur and used it for every image. We report
quantitative results in Table 2 in terms of improvement
in signal-to-noise ratio (ISNR),^4 and compare our method
to the classical Richardson-Lucy algorithm [23], and to
recent state-of-the-art methods [4,10,16]. A few values
are missing in the table: these experiments were not done
by the authors of the papers. We observe that our method
is competitive with or better than the state of the art in
experiments 2, 3, 4, 5 and 6, where the supports of the
blur kernels are relatively small. On the contrary, our
algorithm is significantly behind other approaches in
experiment 1, probably because our patches are too small
compared to the kernel size. The simple linear model,
while not at the state of the art, gives surprisingly
good results for most of the blurs. Its combination with
the dictionaries shows a significant improvement, leading
to state-of-the-art performance. Qualitative results are
presented in Figures 1, 2 and 3.



^4 Denoting by MSE the mean-squared-error for images
whose intensities are between 0 and 255, the PSNR is
defined as $\mathrm{PSNR} = 10 \log_{10}(255^2/\mathrm{MSE})$
and is measured in dB. A gain of 1 dB reduces the MSE by
approximately 20%.
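
For reference, the PSNR defined above and the ISNR reported in the result
tables (taken here as the PSNR of the restored image minus the PSNR of the
degraded input) can be computed as in the following sketch. The function
names and the list-of-rows image format are assumptions of the example,
and the PSNR is undefined when the two images are identical (zero MSE).

```python
import math


def mse(a, b):
    """Mean squared error between two images given as lists of rows."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB for intensities in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse(reference, estimate))


def isnr(reference, degraded, restored):
    """Improvement in PSNR (dB) of `restored` over `degraded`."""
    return psnr(reference, restored) - psnr(reference, degraded)


reference = [[100.0, 100.0], [100.0, 100.0]]
degraded = [[110.0, 110.0], [110.0, 110.0]]   # MSE of 100
restored = [[101.0, 101.0], [101.0, 101.0]]   # MSE of 1
gain_db = isnr(reference, degraded, restored)

# Factor by which the MSE is multiplied for a gain of 1 dB.
mse_ratio_for_1db = 10.0 ** (-1.0 / 10.0)
```

Dividing the MSE by 100 corresponds to a gain of 20 dB, and a gain of 1 dB
multiplies the MSE by about 0.79, which is the reduction of roughly 20%
mentioned in the footnote.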




Fig. 5 Anisotropic kernels from [16] used in our experiments. 

5.2 Astronomical Images 

Our method is not designed specifically for the
restoration of natural images. It adapts itself to the
training set and can thus be applied to various data. This
versatility is illustrated here on astronomical imaging,
which is a field where non-blind deblurring has had a major
industrial impact. The experimental setting is based on a
classical astronomical case: a star image has to be
recovered from a blurred and noisy version of it. The blur
kernel is the Hubble Space Telescope kernel as given
in [25]. The additive noise is Gaussian. The training set
is constructed from several other star images.

Figure 4 presents the results with several deblurring
algorithms. The result of our method is quantitatively
better than those of the other algorithms: While the two
algorithms adapted to natural images [4,15] give PSNRs of
30.8 and 31.3, our method gets 33.5. In particular, our
algorithm manages to recover really high values on the
brightest stars. This is not surprising, since several of
these algorithms use priors that do not fit astronomical
images well, but it validates the capability of our method
to adapt to various data.

5.3 Non-Blind Deblurring with Anisotropic Kernels 

While deblurring isotropic blurs is sufficient in many 
applications, anisotropic blurring appears in practical 
cases, e.g., camera-shaking blur. To test our algorithm 
on this setting, we used the kernels from the database 
by Levin et al. [16]. The local nature of our algorithm
makes the treatment of large blurs computationally
challenging, and so we only worked with downsampled
versions of the proposed kernels (by a factor of 2). The 8 kernels
used are shown in Figure 5. White Gaussian noise of 
variance 2 is added to the blurry images before deblur- 
ring. We compare in Table 3 with the sparse-gradient- 
based algorithm from Levin et al. [15] which is, to the 
best of our knowledge, the one giving the state-of-the- 
art results for this type of kernels. 
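
The noise model used in this experiment and the ISNR figure of merit reported in the tables can be sketched as follows. This is only an illustration under stated assumptions: images are represented as flat lists of intensities, the blur itself is omitted, the function names are ours, and the "restored" signal is a hypothetical output that halves the error of the degraded one.

```python
import math
import random


def mse(reference, estimate):
    """Mean squared error between two equally sized lists of intensities."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def add_white_gaussian_noise(pixels, variance=2.0, seed=0):
    """Add i.i.d. zero-mean Gaussian noise of the given variance."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # standard deviation, about 1.41 for variance 2
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def isnr(sharp, degraded, restored):
    """ISNR in dB: PSNR(restored) - PSNR(degraded).

    The peak value cancels, leaving 10 log10(MSE_degraded / MSE_restored).
    """
    return 10.0 * math.log10(mse(sharp, degraded) / mse(sharp, restored))


# Synthetic stand-in for an image (assumption: any intensities in [0, 255]).
sharp = [float(v % 256) for v in range(20000)]
degraded = add_white_gaussian_noise(sharp, variance=2.0)
# Hypothetical restoration that halves the error at every pixel.
restored = [(s + d) / 2.0 for s, d in zip(sharp, degraded)]
gain = isnr(sharp, degraded, restored)
print(f"ISNR: {gain:.2f} dB")
```

Halving the error at every pixel divides the MSE by 4, so the ISNR of this hypothetical restoration is $10 \log_{10} 4$, about 6 dB, independently of the peak intensity.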
Table 2 Isotropic deblurring results in ISNR (PSNR improvement). For each image/experiment, the best result is in bold.
Four values are missing: the results for this experiment were taken from [29], whose authors do not test on exactly the same set of images
as we do.





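Cameraman
Method               | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6
PSNR input image     |  20.76 |  22.35 |  22.29 |   24.7 |  25.53 |  23.44
Richardson-Lucy [23] |   4.47 |   5.53 |   3.58 |   0.49 |   1.21 |   1.04
Sparse gradient [15] |   7.73 |   6.89 |   4.78 |   2.24 |   2.64 |   2.70
SA-DCT [10]          |   8.55 |   8.11 |   6.33 |   3.37 |      - |      -
BM3D [4]             |   8.34 |   8.19 |   6.40 |   3.34 |   3.73 |   3.83
Linear               |   3.34 |   7.72 |   6.00 |   3.20 |   3.47 |   2.69
Linear + Dictionary  |   4.76 |   8.35 |   6.47 |   3.57 |   3.94 |   3.35

Lena
Method               | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6
PSNR input image     |  25.84 |  27.57 |  27.35 |  29.00 |  30.74 |  28.97
Richardson-Lucy [23] |   4.80 |   5.29 |   2.71 |   0.02 |   0.26 |   0.53
Sparse gradient [15] |   7.02 |   2.83 |   5.44 |   4.06 |   3.30 |   3.33
SA-DCT [10]          |   7.79 |   7.55 |   6.10 |   4.49 |   3.56 |   3.46
BM3D [4]             |   7.97 |   7.95 |   6.53 |   4.81 |   4.18 |   4.12
Linear               |   3.58 |   7.30 |   5.82 |   4.64 |   3.89 |   3.58
Linear + Dictionary  |   4.83 |   7.79 |   6.13 |   5.16 |   4.34 |   4.17

House
Method               | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6
PSNR input image     |  24.11 |  26.28 |  26.10 |  28.51 |  30.16 |  28.18
Richardson-Lucy [23] |   6.46 |   5.86 |   3.68 |   0.04 |   0.25 |   0.59
Sparse gradient [15] |  10.16 |   8.03 |   6.43 |   4.09 |   3.47 |   3.92
SA-DCT [10]          |   10.5 |   9.02 |   7.74 |   4.99 |   4.14 |   4.21
BM3D [4]             |  10.85 |   9.32 |   8.14 |   5.13 |   4.79 |   5.30
Linear               |   4.25 |   8.90 |   7.58 |   5.22 |   4.51 |   4.26
Linear + Dictionary  |   6.99 |   9.32 |   7.71 |   5.74 |   4.98 |   5.09

Barbara
Method               | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6
PSNR input image     |  22.49 |  23.49 |  23.35 |  24.28 |  25.02 |  23.46
Richardson-Lucy [23] |   2.26 |   2.70 |   1.13 |  -0.06 |   0.12 |   0.02
Sparse gradient [15] |   2.88 |   6.87 |   1.51 |   0.57 |   0.66 |   1.11
SA-DCT [10]          |   4.79 |   5.45 |   2.54 |   1.31 |      - |      -
BM3D [4]             |   5.86 |   7.80 |   3.94 |   1.90 |   3.17 |   1.94
Linear               |   2.39 |   7.18 |   4.27 |   1.86 |   2.89 |   1.56
Linear + Dictionary  |   2.65 |   7.64 |   4.59 |   2.00 |   3.11 |   1.70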



Table 3 Anisotropic deblurring results in mean ISNR 
(PSNR improvement) over 5 images. The kernels used are 
downsampled versions of those from [16]. 



Kernel               |     1 |     2 |     3 |     4
Sparse gradient [15] |  9.04 |  6.91 |  7.49 | 10.67
Ours                 | 10.67 |  7.17 |  9.02 |  6.63

Kernel               |     5 |     6 |     7 |     8
Sparse gradient [15] |  8.64 |  9.18 | 11.15 | 10.24
Ours                 | 10.52 | 10.03 |  9.64 |  7.75



Our method does significantly worse than [15] on
three of these kernels. These are the ones where the
kernel is large, and we think this is probably due to the
locality of our predictor: for the 8 kernels, we worked with
patches of size 13, which might not be sufficient for
kernels that are too large.



5.4 Digital Zoom 

Following the same experimental protocol as for the
deblurring experiments, we have evaluated our method
on the digital zooming task. The dictionary size is $k = 512$,
and the patch sizes are $m_h = 11$ and $m_s = 7$.
Digital zooming is usually performed on good-quality images
with very little noise; for this reason, we use a small
regularization parameter $\lambda$, which is set to 0.005.

It is always difficult to evaluate quantitatively the
results of digital zoom algorithms. Indeed, upsampling
and downsampling methods are often subject to sub-pixel
misalignments, which are visually imperceptible
but cause large differences in mean squared error.
Moreover, the antialiasing filter that has to be applied during
the downsampling is rarely detailed, making comparisons
difficult. For this experiment, we used the Matlab
function imresize with bicubic interpolation to create
the low-resolution images. The choice of the antialiasing
filter, which is used to create the training set, is really
important: with too strong an antialiasing filter our method
might oversharpen the images, while with a weak
one it might not deblur enough.
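
The influence of the antialiasing filter on the low-resolution training data can be illustrated on a one-dimensional signal. The sketch below is not the bicubic filter of imresize: it uses a simple box filter as a stand-in (an assumption made for brevity), and the function name is ours.

```python
def downsample(signal, factor=2, antialias=True):
    """Downsample a 1-D signal by an integer factor.

    With antialias=True, each output sample is the mean of `factor`
    consecutive inputs (a box filter, used here as a stand-in for the
    bicubic kernel); otherwise every `factor`-th sample is kept as is.
    """
    n = len(signal) // factor
    if antialias:
        return [sum(signal[i * factor:(i + 1) * factor]) / factor
                for i in range(n)]
    return [signal[i * factor] for i in range(n)]


# Highest-frequency pattern representable at the original resolution.
alternating = [0.0, 255.0] * 4
without_filter = downsample(alternating, antialias=False)
with_filter = downsample(alternating, antialias=True)
print(without_filter)
print(with_filter)
```

Without filtering, the alternating pattern collapses to a constant dark signal (aliasing), whereas the filtered version yields the mean gray level. The low-resolution half of each training pair therefore depends directly on this choice, and so does the mapping that is learned from the pairs.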

We compare quantitatively with the method of
Yang et al. [28], which also uses dictionaries, in order to
demonstrate the efficiency of the discriminative approach.
The dictionary sizes are the same as ours (512), and the parameter
$\lambda$ is chosen on a validation set of images. This method
works in two steps: first, it predicts a high-resolution
image from a filtered version of the low-resolution one
using pairs of dictionaries; then, the image is cleaned
using a backprojection. We compare the results at both
steps with our method in Table 4.
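
For readers unfamiliar with backprojection, the general idea is to correct a high-resolution estimate so that downsampling it reproduces the observed low-resolution image. The following one-dimensional sketch shows a generic iterative back-projection step; it is not the implementation of [28], and the box downsampling and sample-replication upsampling operators are assumptions chosen for simplicity.

```python
def downsample(x, factor=2):
    """Box-filter downsampling: mean of each group of `factor` samples."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def upsample(y, factor=2):
    """Upsampling by replicating each sample `factor` times."""
    return [v for v in y for _ in range(factor)]


def back_project(estimate, observed, factor=2, iterations=1, step=1.0):
    """Iterative back-projection: x <- x + step * U(y - D x).

    `estimate` is the current high-resolution guess x and `observed` the
    low-resolution signal y; D and U are the operators defined above.
    """
    x = list(estimate)
    for _ in range(iterations):
        residual = [yi - di for yi, di in zip(observed, downsample(x, factor))]
        correction = upsample(residual, factor)
        x = [xi + step * ci for xi, ci in zip(x, correction)]
    return x


observed = [10.0, 20.0, 30.0]                   # low-resolution signal
estimate = [8.0, 9.0, 22.0, 25.0, 27.0, 28.0]   # hypothetical prediction
refined = back_project(estimate, observed)
print(downsample(refined))
```

With these particular operators, downsampling an upsampled signal returns the signal itself, so a single iteration with unit step already makes the refined estimate consistent with the observation. With other filters, several iterations are generally needed.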

Our method outperforms the full method of Yang
et al. [28] by a small margin, but their results
obtained with dictionaries only are significantly worse
than ours. The discriminative learning of the
dictionaries and the addition of the linear predictor greatly
improve the results. Figure 6 compares our results with
those of Yang et al. using one image from [28]. We
have observed that both methods improve significantly
upon bicubic interpolation and give similar
results (with the backprojection step for the method of
Yang et al. [28]).

We have also compared our method qualitatively
with other works: in Figure 7, we present digital zooming
results (by a factor of 4) obtained on one image from
[7, 12]. Our results are in general slightly better visually
than [7] (see the texture of the baby's hat for instance),
Fig. 1 Examples of deblurring and close-ups for case 2. First two rows, from top to bottom, left to right: original image,
blurry image, Richardson-Lucy, sparse gradient [15], SA-DCT [10], our method. Last two rows: close-ups in the same order.
Best seen by zooming on a computer screen.



but slightly behind [12] in terms of sharpness of edges
(e.g., the baby's mouth). On the other hand, the algorithm
of Glasner et al. [12] sometimes reconstructs structures
not present in the original image (e.g., square edges in
the baby's eye). In textured areas, we perform as well
as [12].



6 Conclusion 

In this paper, we have presented a new formulation for 
image deblurring and digital zooming using a super- 
vised formulation of dictionary learning combined with 
a linear predictor. With a stochastic gradient descent 
algorithm, our approach is efficient and allows the use 
of millions of training samples. Experiments on natural 
Fig. 2 Examples of deblurring for case 3. First two rows, from top to bottom, left to right: original image, blurry image,
Richardson-Lucy, sparse gradient [15], SA-DCT [10], our method. Last two rows: close-ups in the same order. Best seen by
zooming on a computer screen.



images show that our method is competitive with the
state of the art for the non-blind deblurring and digital
zooming tasks. Future work will consist of extending
the approach to the blind deblurring problem, where
a blur kernel has to be learned at the same time as
the dictionaries, and exploiting self-similarities
in images, which have proven to be very successful for
digital zooming [12] and image denoising [19].
Fig. 3 Examples of deblurring for case 4. From top to bottom, left to right: original image, blurry image, Richardson-Lucy,
sparse gradient [15], SA-DCT [10], our method. Best seen by zooming on a computer screen.



Table 4 Digital zoom (by a factor of 2) quantitative results in
PSNR. We present two values for the method of Yang et al. [28]: the
first one is the result given by their dictionaries alone, the second
one is obtained by adding a backprojection algorithm to the
dictionaries. For each image, the best result is in bold.





Image  | Cubic spline | Yang et al. [28] | Ours
Lena   |        31.91 |    32.13 / 33.06 | 33.31
Girl   |        31.44 |    31.48 / 31.93 | 32.00
Flower |        38.48 |    38.69 / 39.59 | 39.92